Selective Pre-processing of Imbalanced Data for Improving Classification Performance
نویسندگان
چکیده
The paper discusses problems of constructing classifiers from imbalanced data. Re-sampling approaches that change the original class distribution are often used to improve performance of classifiers for the minority class. We describe a new approach to selective pre-processing of imbalanced data which combines local over-sampling of the minority class with filtering difficult examples from majority classes. In experiments focused on rule-based and tree-based classifiers we compare our approach with two other well known preprocessing methods – NCR and SMOTE. The results show that NCR is too strongly biased towards the minority class and leads to deteriorated specificity and overall accuracy, while SMOTE and our approach do not demonstrate such behavior. Analysis of a degree to which the original class distribution has been modified also reveals that our approach does not introduce so extensive changes
منابع مشابه
Improving Imbalanced data classification accuracy by using Fuzzy Similarity Measure and subtractive clustering
Classification is an one of the important parts of data mining and knowledge discovery. In most cases, the data that is utilized to used to training the clusters is not well distributed. This inappropriate distribution occurs when one class has a large number of samples but while the number of other class samples is naturally inherently low. In general, the methods of solving this kind of prob...
متن کاملImproving Rule-Based Classifiers Induced by MODLEM by Selective Pre-processing of Imbalanced Data
In the paper we discuss inducing rule-based classifiers from imbalanced data, where one class (a minority class) is under-represented in comparison to the remaining classes (majority classes). To improve the ability of a classifier to recognize this class, we propose a new selective pre-processing approach that is applied to data before inducing a rule-based classifier. The approach combines se...
متن کاملEnhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
متن کاملIntegrating Selective Pre-processing of Imbalanced Data with Ivotes Ensemble
In the paper we present a new framework for improving classifiers learned from imbalanced data. This framework integrates the SPIDER method for selective data pre-processing with the Ivotes ensemble. The goal of such integration is to obtain improved balance between the sensitivity and specificity for the minority class in comparison to a single classifier combined with SPIDER, and to keep over...
متن کاملClassifying imbalanced data sets using similarity based hierarchical decomposition
Classification of data is difficult if the data is imbalanced and classes are overlapping. In recent years, more research has started to focus on classification of imbalanced data since real world data is often skewed. Traditional methods are more successful with classifying the class that has the most samples (majority class) compared to the other classes (minority classes). For the classifica...
متن کامل